Abstract. We observe returns of a simple random walk on a finite graph to 
a fixed node, and would like to infer properties of the graph, in particular 
properties of the spectrum of the transition matrix. This is not possible in 
general, but at least the eigenvalues can be recovered under fairly general 
conditions, e.g. when the graph has a node-transitive automorphism group. 
The main result is that by observing polynomially many returns, it is possible 
to estimate the spectral gap of such a graph up to a constant factor. 



1. Introduction 

A spelunker has an accident in the cave. His lamp goes out, he cannot move, all 
he can hear is a bat flying by every now and then on its random flight around the 
cave. What can he learn about the shape of the cave? 

In other words: What can we learn about the structure of a finite graph using 
only information obtained by observing the returns of a random walk on the graph 
to a fixed node? 

Let $G = (V, E)$ be a connected simple graph with $n = |V| > 1$ vertices, and let 
$r \in V$ be a fixed node. Let $W_0 = r, W_1, W_2, \dots$ be the steps of a simple random walk 
on $G$ starting from $r$. Assume that we observe the return time sequence, the infinite 
sequence of (random) times $0 < T_1 < T_2 < \dots$ when the walk visits $r$. Alternatively, 
this can be described as a sequence $a_1, a_2, a_3, \dots$ of bits, where $a_i = 1$ if the walk is 
at $r$ at time $i$, and $a_i = 0$ otherwise. Note that $T_2 - T_1, T_3 - T_2, \dots$ are independent samples 
from the same distribution as $T_1$, which we call the return distribution of $G$ to $r$. 
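
To make the observation model concrete, here is a minimal simulation sketch. The adjacency-list representation, the function name `return_times`, and the 4-cycle example are illustrative assumptions, not part of the paper.

```python
import random

def return_times(adj, root, num_returns, seed=0):
    """Simulate a simple random walk started at `root` and collect the
    first `num_returns` return times T_1 < T_2 < ... to `root`.
    `adj` maps each node to the list of its neighbours (hypothetical format)."""
    rng = random.Random(seed)
    times = []
    node, t = root, 0
    while len(times) < num_returns:
        node = rng.choice(adj[node])   # one step of the simple random walk
        t += 1
        if node == root:
            times.append(t)
    return times

# Example: the 4-cycle, observed at node 0.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
T = return_times(cycle4, root=0, num_returns=2000)
# Gaps between consecutive returns are i.i.d. samples of the return distribution.
gaps = [T[0]] + [b - a for a, b in zip(T, T[1:])]
mean_gap = sum(gaps) / len(gaps)
```

On the 4-cycle every gap is even, since the graph is bipartite, and the empirical mean of the gaps should be close to 4, the number of nodes.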

We say that a parameter $p(G, r)$ of the graph $G$ and root $r$ can be reconstructed 
(from the return time sequence), if for every two rooted graphs $(G, r)$ and $(G', r')$ 
for which the return time sequence has the same distribution, we have 
$p(G, r) = p(G', r')$. 

Which graph parameters can be reconstructed from the return time sequence? 
There is a trivial way to construct different graphs with the same return sequence: 
take two isomorphic copies and glue them together at the root. Sometimes it makes 
sense to assume that we also know the degree d(r) of the root. In this case, we can 
reconstruct the number of edges through 

\[ |E| = d(r)\,E(T_1)/2. \tag{1} \]

If the graph is regular, then we can reconstruct the number of nodes: 

\[ n = |V| = E(T_1). \tag{2} \]
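
For readers who want the reasoning behind (1) and (2), here is a short sketch. It assumes two standard facts: the stationary distribution of the simple random walk is proportional to the degrees, and the expected return time to $r$ is the reciprocal of the stationary probability of $r$ (Kac's formula). The symbol $\pi$ is notation introduced for this sketch.

```latex
\begin{align*}
\pi(v) &= \frac{d(v)}{\sum_{u \in V} d(u)} = \frac{d(v)}{2|E|}, \\
E(T_1) &= \frac{1}{\pi(r)} = \frac{2|E|}{d(r)}
  \quad\Longrightarrow\quad |E| = \frac{d(r)\,E(T_1)}{2}, \\
\text{$G$ $d$-regular:}\quad 2|E| &= nd
  \quad\Longrightarrow\quad E(T_1) = \frac{nd}{d} = n .
\end{align*}
```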

Another trivial example is to observe whether all the numbers $T_i$ are even. This is so 
if the graph is bipartite, and it happens with probability 0 otherwise. 

A natural candidate for a reconstructible quantity is the spectrum of the 
transition matrix $M$ of the random walk on $G$. Let $\lambda_1 = 1, \lambda_2, \dots, \lambda_n$ be the eigenvalues 
of $M$, arranged in decreasing order. Bipartiteness is equivalent to saying that 
$\lambda_n = -1$. 

We are going to show by a simple example that the spectrum is not 
reconstructible in general. On the other hand, we show that if $\lambda$ is an eigenvalue of $G$ 
which has an eigenvector $v \in \mathbb{R}^V$ such that $v_r \ne 0$, then $\lambda$ is reconstructible. We 
note that the multiplicity of $\lambda$ is not necessarily reconstructible. 

A special case where the eigenvector condition above is satisfied for all eigenvalues 
is when G is node-transitive. We don't know whether in this case the multiplicities 
are reconstructible. 

Of particular interest is the issue of efficient reconstruction, by which we mean 
observing a polynomial (or expected polynomial) number of returns. We consider 
this question in the case of the spectral gap $\tau = 1 - \lambda_2$. Assuming the graph is 
node-transitive, we describe a procedure to estimate $\tau$ up to a constant factor, using just 
polynomially many (in $n$) of the first values of the $T_i$. We give an example of a 
graph where the spectral gap cannot be recovered at all from observations made at 
one particular node. 

This question was first mentioned, together with other related problems, in [2]. 
Another related work is that of Feige [3], which presents a randomized space-efficient 
algorithm that determines whether a graph is connected. His method uses return 
times of random walks to estimate the size of connected components. 

2. Examples 

Example 1. Consider the two trees in Figure 1. The distribution of the return time 
to the root is the same in both trees (see Section 5). The eigenvalues of the tree on the 
left are 

\[ 1,\ \sqrt{3}/2,\ \sqrt{6}/4,\ 0,\ 0,\ 0,\ 0,\ 0,\ -\sqrt{6}/4,\ -\sqrt{3}/2,\ -1, \]

while the eigenvalues of the tree on the right are 

\[ 1,\ \sqrt{3}/2,\ \sqrt{3}/2,\ \sqrt{6}/4,\ 0,\ 0,\ 0,\ -\sqrt{6}/4,\ -\sqrt{3}/2,\ -\sqrt{3}/2,\ -1. \]

Note that the eigenvalues are the same, but their multiplicities are different. 

Figure 1. Two trees with the same return times but different spectra 

Example 2. Let $T$ be a tree in which all internal nodes have degree $d+1$ and which 
has a "root" $r$ such that all leaves are at distance $h$ from the root. We construct a 
graph $G$ by adding a $d$-regular graph on the leaves. 

For a fixed $h$ and $d$, all graphs obtained this way are $(d+1)$-regular graphs, and 
the distribution of the return time to the root is the same in all such graphs. On 
the other hand, graphs obtained this way can have very different properties. If we 
add an expander on the leaves, the graph $G$ will be an expander. (Recall that $G$ is 
a $c$-expander iff $|\partial S| > c|S|$ for every nonempty set of vertices $S$ with $|S| < |G|/2$. 
For background on expanders and spectral gap see e.g. [?].) If we connect "twin" 
leaves to each other, and also match up "cousins" to get $d$ new edges at each node, 
then for $h > 2$ the root will be a cutpoint. For expanders, the eigenvalue gap 
$\lambda_1 - \lambda_2$ is bounded from below by a positive function of $d$, while for the graphs 
with cutpoints in the middle the eigenvalue gap tends to 0 as $h \to \infty$. 

3. Preparation: some algebra and generating functions 

3.1. Return probabilities and eigenvalues. Denote by $P_k(x, y)$ the probability 
that a simple random walk on $G$ starting at $x \in V$ will be at $y \in V$ at time $k$. 
Clearly 

\[ P_k(x,y) = e_x^T M^k e_y. \tag{3} \]

Here $M$ is not symmetric, but we can consider the symmetrized matrix 
$N = D M D^{-1}$, where $D$ is a diagonal matrix with the positive numbers $\sqrt{d(i)}$ in the 
diagonal. The matrix $N$ has the same eigenvalues as $M$, and so we have 

\[ P_k(r,r) = \sum_{i=1}^{n} f_i(r)^2 \lambda_i^k, \tag{4} \]

where $f_1, f_2, \dots, f_n$ is an orthonormal basis of eigenfunctions of $N$ corresponding to 
the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$. 

We note that if the graph is node-transitive, then the value $P_k(r, r)$ is the same 
for all $r$, and hence by averaging we get the simpler formula 

\[ P_k(r,r) = \frac{1}{n}\,\mathrm{trace}(M^k) = \frac{1}{n}\sum_{i=1}^{n} \lambda_i^k. \tag{5} \]

At some point, it will be convenient to consider the lazy version of our chain, 
i.e., the Markov chain with transition matrix $M' = (1/2)(I + M)$ (before doing a 
step, we flip a coin to decide if we want to move at all). The observer can easily 
pretend to be watching the lazy version of the chain: after each step, 
the observer flips a coin repeatedly until it comes up heads, and advances the watch 
by the number of coin flips. The distribution after $k$ lazy steps is easy to compute 
from (3): 

\[ P'_k(x,y) = 2^{-k} e_x^T (I + M)^k e_y = 2^{-k} \sum_{j=0}^{k} \binom{k}{j} e_x^T M^j e_y = 2^{-k} \sum_{j=0}^{k} \binom{k}{j} P_j(x,y). \tag{6} \]
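
As an illustration of (6), the following sketch converts a list of ordinary return probabilities into lazy ones using exact rational arithmetic. The function name `lazy_return_probs` and the 4-cycle input values are assumptions made for the example; the 4-cycle probabilities are $P_0 = 1$, $P_k = 1/2$ for even $k \ge 2$ and $P_k = 0$ for odd $k$.

```python
from fractions import Fraction
from math import comb

def lazy_return_probs(p):
    """Given p[j] = P_j(x, y) for j = 0..K, return the list of lazy
    probabilities P'_k(x, y), k = 0..K, via the binomial formula (6)."""
    return [
        sum(comb(k, j) * p[j] for j in range(k + 1)) / Fraction(2) ** k
        for k in range(len(p))
    ]

# Return probabilities P_k(r, r) of the simple random walk on the 4-cycle.
p_cycle4 = [Fraction(1), Fraction(0), Fraction(1, 2), Fraction(0), Fraction(1, 2)]
lazy = lazy_return_probs(p_cycle4)
```

For the 4-cycle the lazy eigenvalues are 1, 1/2, 1/2, 0, so the output can be checked against $(1 + 2 \cdot 2^{-k})/4$ for $k \ge 1$.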

The main advantage of the lazy chain is that its eigenvalues are nonnegative. 
Furthermore, for a lazy chain (with $M$ now denoting its transition matrix) we have 

\[ \lambda_2 + \dots + \lambda_n = \mathrm{trace}(M) - 1 = \frac{n}{2} - 1, \]

and hence $\lambda_2 \ge 1/3$ if $n \ge 4$. 

3.2. The generating function of return times. Let us introduce the generating 
function 

\[ f(t) = \sum_{k=0}^{\infty} P_k(r,r)\,t^k = \sum_{i=1}^{n} f_i(r)^2 \frac{1}{1 - t\lambda_i}. \tag{7} \]

There are several other useful expressions for $f(t)$; for example, we get from (3) 
that 

\[ f(t) = e_r^T (I - tM)^{-1} e_r, \]

and expressing this in terms of determinants, we get 

\[ f(t) = \frac{\det(I' - tM')}{\det(I - tM)}, \tag{8} \]

where $M'$ is the matrix obtained from $M$ by deleting the row and column 
corresponding to the root, and $I'$ is the $(n - 1) \times (n - 1)$ identity matrix. 

It will be convenient to do a little algebraic manipulation. The reciprocal of this 
function is also an interesting generating function: 

\[ \frac{1}{f(t)} = 1 - \sum_{k=1}^{\infty} s_k t^k, \tag{9} \]

where $s_k = P(T_1 = k)$ is the probability that the first return to the root occurs at 
the $k$-th step. This function has a root at $t = 1$, so it makes sense to divide by 
$1 - t$, to get the analytic function 

\[ \frac{1}{(1 - t) f(t)} = \sum_{k=0}^{\infty} z_k t^k, \tag{10} \]

where 

\[ z_k = 1 - \sum_{j \le k} s_j = \sum_{j > k} s_j \]

is the probability that the random walk does not return to the root during the first 
$k$ steps. 
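
The relation (9) between $f(t)$ and the first-return probabilities can be unrolled coefficient by coefficient. The sketch below does this with exact fractions; the function names and the 4-cycle input are illustrative assumptions. It uses the renewal identity $P_k(r,r) = \sum_{j=1}^{k} s_j P_{k-j}(r,r)$, which is (9) multiplied out.

```python
from fractions import Fraction

def first_return_probs(p):
    """p[k] = P_k(r, r) for k = 0..K, with p[0] = 1.
    Returns s with s[k] = P(T_1 = k), using P_k = sum_{j=1..k} s_j P_{k-j}."""
    s = [Fraction(0)] * len(p)
    for k in range(1, len(p)):
        s[k] = p[k] - sum(s[j] * p[k - j] for j in range(1, k))
    return s

def no_return_probs(s):
    """z[k] = 1 - sum_{j <= k} s[j]: probability of no return in the first k steps."""
    z, acc = [], Fraction(0)
    for sk in s:
        acc += sk
        z.append(1 - acc)
    return z

# Return probabilities of the simple random walk on the 4-cycle, k = 0..6.
half = Fraction(1, 2)
p_cycle4 = [Fraction(1), Fraction(0), half, Fraction(0), half, Fraction(0), half]
s = first_return_probs(p_cycle4)
z = no_return_probs(s)
```

On the 4-cycle the first return happens at time 2, 4, 6 with probabilities 1/2, 1/4, 1/8, and $z_{2k} = z_{2k+1}$, as expected for a bipartite graph.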



4. Reconstructing nondegenerate eigenvalues 

It is these formulas which form the basis of learning about the spectrum of 
$G$ from the visiting times of the random walk at $r$, since $P_k(r,r)$ is determined 
by the distribution of return times, and can be easily estimated from the visiting 
times (see Section 6.1). We call an eigenvalue of $M$ nondegenerate if at least one 
of the corresponding eigenfunctions $f$ satisfies $f(r) \ne 0$. One can see from (4) 
that the nonzero nondegenerate eigenvalues are determined by the distribution 
of return times. Using $\sum_{i} f_i(r)^2 = 1$ for the orthonormal basis we conclude 
that whether zero is a nondegenerate eigenvalue of $M$ is also determined. The 
return time distribution determines $f(t)$ and this can also be used to find the 
nondegenerate eigenvalues: the poles of $f(t)$ are exactly the reciprocals of the 
nonzero nondegenerate eigenvalues of $M$. Zero is a nondegenerate eigenvalue if and 
only if $\lim_{t \to \infty} f(t) > 0$. Then we get 

Proposition 1. If two rooted graphs have the same return time distribution, then 
they have the same nondegenerate eigenvalues. 

Let us remark that if $G$ has a node-transitive automorphism group, then every 
eigenvalue of $M$ is nondegenerate. Indeed, every eigenvalue has an eigenvector 
which does not vanish at some node; by node-transitivity, it also has an eigenvector 
that does not vanish at the root. 

Let us also remark that the multiplicity of a nondegenerate eigenvalue is not 
uniquely determined: 0 is a nondegenerate eigenvalue of both trees in Example 1, 
but it has different multiplicities in the two. Furthermore, degenerate 
eigenvalues are not determined by the return times: the second largest eigenvalues of 
the transition matrices of the two $(d+1)$-regular graphs constructed in Example 2 
are different. It follows from Proposition 1 that at least for the second graph, the 
second largest eigenvalue is degenerate. 

5. Trees 

We want to put Example 1 in a broader context. For trees, we can simplify the 
generating function a bit: since trees are bipartite, we have $z_{2k} = z_{2k+1}$, and hence 
it makes sense to divide by $t + 1$ and then substitute $x = t^2$. It will be convenient 
to scale by the degree of the root, and to work with the function 

\[ h_G(x) = d(r) \sum_{k=0}^{\infty} z_{2k} x^k = \frac{d(r)}{(1 - t^2)\,f(t)}, \qquad x = t^2. \tag{11} \]

It is easy to see that we did not lose any information here: we have $h_{G_1}(x) = h_{G_2}(x)$ 
for two trees $G_1$ and $G_2$ if and only if they have the same return time distribution 
and their roots have the same degree. 

For a rooted tree with a single edge, $h_G(x) = 1$. If a rooted tree $G$ is obtained 
by gluing together the roots of two rooted trees $G_1$ and $G_2$, then 

\[ h_G(x) = h_{G_1}(x) + h_{G_2}(x). \tag{12} \]

This is easily seen by conditioning on which tree the random walk starts in. 
Furthermore, if we attach a new leaf $r'$ to the root $r$ of a tree $G$ and make this the 
root to get a new rooted tree $G'$, then 

\[ h_{G'}(x) = \frac{1 + h_G(x)}{1 + (1 - x)\,h_G(x)}. \tag{13} \]

To see this, consider a walk on $G'$ starting at $r'$, and the probability $z'_{2k}$ that it does 
not return to $r'$ in the first $2k$ steps ($k \ge 1$). The first step leads to $r$; the second 
step has to use a different edge, which has a probability of $d(r)/(d(r) + 1)$. We can 
view the walk now as a random walk on $G$ until it returns to $r$. The probability 
that this happens after $2j$ steps is $z_{2j-2} - z_{2j}$. If $j \ge k$ then the walk will certainly 
not return to $r'$ in the first $2k$ steps. If $j < k$, then we can think of the situation 
as just having made a step from $r'$, and so the probability that we don't return to 
$r'$ in the next $2k - 2j - 1$ steps is $z'_{2k-2j}$. Hence we get the equation 

\[ z'_{2k} = \frac{d(r)}{d(r) + 1} \Bigl( z_{2k-2} + \sum_{j=1}^{k-1} (z_{2j-2} - z_{2j})\, z'_{2k-2j} \Bigr). \]

Multiplying by $x^k$ and summing over all $k > 0$, we get (13). 
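
The summation step can be spelled out as follows. This is a sketch: $Z(x)$, $Z'(x)$, $S(x)$ and the abbreviation $d = d(r)$ are shorthand introduced here, not notation from the text. It uses $z_0 = z'_0 = 1$ and the fact that the new root $r'$ has degree 1, so $h_{G'} = Z'$.

```latex
\begin{align*}
z'_{2k} &= \frac{d}{d+1}\Bigl(z_{2k-2} + \sum_{j=1}^{k-1}(z_{2j-2}-z_{2j})\,z'_{2k-2j}\Bigr)
  \qquad (k \ge 1), \\
Z(x) &= \sum_{k\ge 0} z_{2k}x^k = \frac{h_G(x)}{d}, \qquad
Z'(x) = \sum_{k\ge 0} z'_{2k}x^k = h_{G'}(x), \\
S(x) &= \sum_{j\ge 1} (z_{2j-2}-z_{2j})\,x^j = 1-(1-x)Z(x), \\
Z'(x)-1 &= \frac{d}{d+1}\Bigl(xZ(x) + S(x)\bigl(Z'(x)-1\bigr)\Bigr), \\
\bigl(Z'(x)-1\bigr)\bigl(1+(1-x)h_G(x)\bigr) &= x\,h_G(x), \\
h_{G'}(x) &= 1+\frac{x\,h_G(x)}{1+(1-x)h_G(x)} = \frac{1+h_G(x)}{1+(1-x)h_G(x)} .
\end{align*}
```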

These formulas can be verified from the definition of $z_k$. They imply that $h_G$ is 
a rational function with integral coefficients. They also provide us with a fast way 
to compute $h_G$, and through this, to verify that the two trees in Example 1 have 
the same return distribution. But we can get more: a way to generate many such 
pairs. 

Suppose that we find a linear dependence between functions $h_G$ for various trees 
$G$. This can be written as 

\[ a_1 h_{G_1} + \dots + a_k h_{G_k} = b_1 h_{G'_1} + \dots + b_m h_{G'_m} \]

with some positive integers $a_1, \dots, a_k, b_1, \dots, b_m$. Now if we glue together the roots 
of $a_1$ copies of $G_1$, ..., $a_k$ copies of $G_k$ to get $G$, and the roots of $b_1$ copies of $G'_1$, 
..., $b_m$ copies of $G'_m$ to get $G'$, then by (12) we'll have 

\[ h_G(x) = h_{G'}(x). \]

We can add a new root to both if we prefer to have an example rooted at a leaf. 

Obviously, we only need to look for trees rooted at leaves. To find such linear 
dependencies, it is natural to find trees for which $h_G(x)$ is "simple", namely the 
ratio of two linear functions, and then find three with a common denominator. A 
general example is a tree $G = G_{a,b}$ of height 3, where the neighbor of the root has 
degree $a$ and has $a - 1$ neighbors of degree $b$. We can allow the degenerate cases 
$b = 1$ (when $G$ is a star rooted at a leaf) and $a = 1$ (when $G$ is a single edge). It is 
easy to compute that$^1$ 

\[ h_{G_{a,b}}(x) = \frac{ab - (b - 1)x}{ab - (ab - 1)x}. \]

So if we fix a $k$ which is not a prime, and consider trees $G = G_{a,b}$ with $ab = k$, 
they all have the same denominator $k - (k - 1)x$, and so for any three of them their 
functions $h_G$ will be linearly dependent. The simplest choice is $k = 4$, when we get 
the trees $G_{1,4}$ (a single edge), $G_{2,2}$ (a path of length 3) and $G_{4,1}$ (a 4-star). Simple 
computation shows that 

\[ h_{G_{1,4}} - 3h_{G_{2,2}} + 2h_{G_{4,1}} = 0. \]

Gluing these together as described above, and adding a new root for good measure, 
gives the two trees in Example 1. 
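
The construction can be checked numerically by evaluating $h_G$ at rational points with the rules (12) and (13). The sketch below is illustrative: the helper names are hypothetical, and it assumes the two trees are built as described, one from $G_{1,4}$ and two copies of $G_{4,1}$, the other from three copies of $G_{2,2}$, each followed by a new root.

```python
from fractions import Fraction

def h_edge():
    """h of a rooted tree consisting of a single edge."""
    return Fraction(1)

def h_glue(*parts):
    """Rule (12): glue rooted trees together at their roots."""
    return sum(parts, Fraction(0))

def h_new_root(h, x):
    """Rule (13): attach a new leaf to the root and make it the new root."""
    return (1 + h) / (1 + (1 - x) * h)

def h_path(num_edges, x):
    """Path with `num_edges` edges, rooted at an endpoint."""
    h = h_edge()
    for _ in range(num_edges - 1):
        h = h_new_root(h, x)
    return h

def h_star(num_edges, x):
    """Star with `num_edges` edges, rooted at one of its leaves."""
    centre = h_glue(*[h_edge()] * (num_edges - 1))  # star rooted at its centre
    return h_new_root(centre, x)

def h_tree_a(x):
    # one copy of G_{1,4}, two copies of G_{4,1}, then a new root
    return h_new_root(h_glue(h_edge(), h_star(4, x), h_star(4, x)), x)

def h_tree_b(x):
    # three copies of G_{2,2} (paths of length 3), then a new root
    return h_new_root(h_glue(*[h_path(3, x)] * 3), x)

samples = [Fraction(1, 2), Fraction(1, 3), Fraction(-2, 5), Fraction(7, 9)]
values = [(h_tree_a(x), h_tree_b(x)) for x in samples]
```

Both trees have 11 nodes, and under these assumptions both functions simplify to $(16 - 6x)/(16 - 18x + 3x^2)$, whose numerator vanishes at $x = 8/3$, the squared reciprocal of $\sqrt{6}/4$.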

Using (7) and (11) it is not hard to see that the roots of the numerator of $h_G(x)$ 
are the squared reciprocals of the nondegenerate nonzero eigenvalues of $G$, except 
for the trivial nondegenerate eigenvalues $\pm 1$. The multiplicities, as we have seen, 
are not necessarily determined by $h_G$. 

Remark. In the special trees constructed above, the reciprocals of the square roots of 
the roots of the denominator are exactly the degenerate eigenvalues of $G$. We don't know 
if this is always so. An interesting open question seems to be whether the degenerate 
eigenvalues are reconstructible for trees. 

6. Effective reconstruction 

In the previous section, we assumed that the exact distribution of the return 
time is known, which is the same as saying that we can observe the random walk 
forever. In this section we are concerned with determining quantities after observing 
a polynomial number of returns. 



$^1$ Are these the only trees for which $h_G$ has linear numerator and denominator? Can one say 
anything about quadratic? What about depth 4? 

6.1. Estimating return probabilities. We show that we can estimate $P_k(r,r)$ 
from the observation of polynomially many return times. Fix $k$ and observe the 
returns $T_1, T_2, \dots$ until the first $T_{i_1}$ with $T_{i_1} \ge k$; call this period an experiment. 
Call the experiment successful if $T_{i_1} = k$. The probability that an experiment 
is successful is $P_k(r, r)$. Note that observing the next $k$ steps and then until the 
first return (i.e., $T_{i_1+1}, \dots, T_{i_2}$ with the smallest $i_2$ such that $T_{i_2} \ge T_{i_1} + k$) is an 
independent experiment. 

So we have a sequence of independent events with the same probability 
$p = P_k(r,r)$, and we want to estimate $p$. By standard results (e.g. Chebyshev's inequality), 
observing $p\varepsilon^{-2}\delta^{-1}$ of them, the relative frequency will be closer than $\varepsilon$ to $p$ 
with probability $1 - \delta$. 

The amount of time a particular trial takes is a random variable, whose 
expectation is $k$ plus the time it takes to get back to $r$ after $k$ steps. This can be bounded 
by the maximum hitting time between nodes, which is $O(n^3)$. Summing up, 

Proposition 2. In an expected time of $O((k + n^3)\varepsilon^{-2}\delta^{-1})$ we can compute an 
estimate of $P_k(r,r)$ which is within an (additive) error of $\varepsilon$ with probability $1 - \delta$. 
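
The experiment scheme above can be sketched in a few lines. The function name `estimate_return_prob` and the toy input are illustrative assumptions; the input is any increasing list of observed return times.

```python
def estimate_return_prob(times, k):
    """Estimate P_k(r, r) from an increasing list of observed return times.
    Each experiment starts at a moment when the walk is at r, waits for the
    first return at least k steps later, and succeeds if that return comes
    exactly k steps later."""
    start, successes, trials = 0, 0, 0
    for t in times:
        if t - start >= k:
            trials += 1
            successes += (t - start == k)
            start = t  # the walk is at r again: the next experiment begins here
    return successes / trials if trials else float("nan")

# Toy observation: returns at times 2, 4, 8, 10, 14, and k = 4.
# Experiments end at times 4 (success), 8 (success) and 14 (failure).
estimate = estimate_return_prob([2, 4, 8, 10, 14], k=4)
```

Returns that arrive fewer than $k$ steps after the start of the current experiment are skipped, which is what makes consecutive experiments independent.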

6.2. Reconstructing the eigenvalue gap. We restrict our attention to 
node-transitive graphs, in which case we can use the trace formula (5). We can use (2) 
to reconstruct the number of nodes $n$. Furthermore, we assume that the chain is 
lazy, so that its eigenvalues are nonnegative, and their sum is $n/2$. 

For a lazy chain, $P_k(r, r)$ tends to $1/n$ monotone decreasing. Furthermore, (5) 
implies that setting 

\[ q_k = P_k(r,r) - \frac{1}{n}, \]

we have (by Chebyshev's sum inequality, since the $\lambda_i$ and the $\lambda_i^k$ are ordered the same way) 

\[ n q_{k+1} = \sum_{i=2}^{n} \lambda_i^{k+1} \ge \frac{1}{n-1} \Bigl( \sum_{i=2}^{n} \lambda_i \Bigr) \Bigl( \sum_{i=2}^{n} \lambda_i^k \Bigr) = \frac{1}{n-1} \bigl(\mathrm{trace}(M) - 1\bigr)\, n q_k, \]

and hence 

\[ q_{k+1} \ge \frac{1}{3}\, q_k \tag{14} \]

for $n \ge 4$ (which we assume without loss of generality). 
We can try to compute recursively $\lambda_1 = 1$ and 

\[ \lambda_j = \lim_{k \to \infty} \Bigl( P_k(r,r) - \sum_{i=1}^{j-1} \frac{\lambda_i^k}{n} \Bigr)^{1/k}. \]

This, however, does not seem to give an effective means of estimating $\lambda_j$ in 
polynomial time. But to estimate at least the eigenvalue gap $\tau = 1 - \lambda_2$ we can use the 
following fact. 

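Lemma 1. We have 

\[ 1 - (n q_k)^{1/k} \le \tau \le 1 - q_k^{1/k}. \tag{15} \]

It is not hard to see that these bounds imply the weaker but more informative 
bounds 

\[ \Bigl( 1 - \frac{\ln n}{\ln(1/q_k)} \Bigr) \bigl( 1 - q_k^{1/k} \bigr) \le \tau \le 1 - q_k^{1/k}, \tag{16} \]

valid whenever $n q_k < 1$. 

Proof. By (5), 

\[ q_k = P_k(r,r) - \frac{1}{n} = \frac{1}{n} \sum_{i=2}^{n} \lambda_i^k, \]

and hence 

\[ \frac{1}{n} \lambda_2^k \le q_k \le \frac{n-1}{n} \lambda_2^k < \lambda_2^k. \]

Thus, since $\lambda_2 = 1 - \tau$, 

\[ 1 - (n q_k)^{1/k} \le \tau \le 1 - q_k^{1/k}. \]

Using the elementary inequality 

\[ \frac{1 - x}{1 - y} \le \frac{\ln x}{\ln y}, \]

valid for $0 < x \le y < 1$, with $x = q_k^{1/k}$ and $y = (n q_k)^{1/k}$, (16) follows. □ 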
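
The sandwich $1 - (n q_k)^{1/k} \le \tau \le 1 - q_k^{1/k}$ from the proof can be checked numerically on a node-transitive example. The sketch below uses the lazy walk on the $n$-cycle, whose eigenvalues $(1 + \cos(2\pi j/n))/2$ are standard; the function names and the choice $n = 8$, $k = 50$ are illustrative assumptions.

```python
import math

def lazy_cycle_eigenvalues(n):
    """Eigenvalues of the lazy simple random walk on the n-cycle, in decreasing order."""
    return sorted(((1 + math.cos(2 * math.pi * j / n)) / 2 for j in range(n)), reverse=True)

def gap_bounds(n, k):
    """Return (lower bound, true gap tau, upper bound) computed from q_k."""
    lam = lazy_cycle_eigenvalues(n)
    q_k = sum(l ** k for l in lam[1:]) / n   # q_k = P_k(r,r) - 1/n, by the trace formula (5)
    lower = 1 - (n * q_k) ** (1 / k)
    upper = 1 - q_k ** (1 / k)
    return lower, 1 - lam[1], upper

lower, tau, upper = gap_bounds(n=8, k=50)
```

Increasing $k$ tightens both bounds, at the price of having to estimate a smaller $q_k$.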

Let $c > 1$. It follows that if we find an integer $k > 0$ such that $q_k < 1/n^c$, then 
$1 - q_k^{1/k}$ is an estimate for the eigenvalue gap $\tau$ which is within a factor of 
$c/(c-1) = 1 + 1/(c-1)$ of the true value. But of course we don't know $q_k$ exactly, only with an additive 
error: by Proposition 2 we can estimate $q_k$ in polynomial time with an additive 
error less than (say) $\varepsilon/n^c$, with high probability. So to get valuable information, 
we need to find a value of $k$ for which $q_k > \varepsilon/n^c$. 

It is well known that the eigenvalue gap of a graph with $n$ nodes is at least $1/n^2$, 
so we get that for $k \ge K_0 = (c + 1) n^2 \ln n$, 

\[ q_k < n \Bigl( 1 - \frac{1}{n^2} \Bigr)^k < n e^{-k/n^2} \le \frac{1}{n^c}. \]

Applying Proposition 2 we can compute an approximation $Q_k$ of $q_k$ that is 
within an additive error of $\varepsilon/(8n^c)$, except with probability $\delta/\log_2 K_0$. By binary search, 
we can find a $k$ in the interval $[0, K_0]$ for which $Q_k < 1/n^c$ but $Q_{k-1} \ge 1/n^c$. 
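
The binary search can be sketched as follows. This is a minimal illustration under stated assumptions: the oracle `Q` returns exact values of $q_k$ for the lazy walk on the 8-cycle instead of noisy estimates from observed returns, and the names `find_threshold_k` and `lazy_cycle_q` are hypothetical.

```python
import math

def lazy_cycle_q(n, k):
    """q_k = P_k(r,r) - 1/n for the lazy walk on the n-cycle, via the trace formula (5)."""
    return sum(((1 + math.cos(2 * math.pi * j / n)) / 2) ** k for j in range(1, n)) / n

def find_threshold_k(Q, threshold, k_max):
    """Binary search for a k in (0, k_max] with Q(k) < threshold <= Q(k - 1).
    Assumes Q(0) >= threshold > Q(k_max); Q does not have to be monotone."""
    lo, hi = 0, k_max  # invariant: Q(lo) >= threshold > Q(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if Q(mid) < threshold:
            hi = mid
        else:
            lo = mid
    return hi

n, c = 8, 2
K0 = math.ceil((c + 1) * n * n * math.log(n))
k = find_threshold_k(lambda m: lazy_cycle_q(n, m), n ** -c, K0)
estimate = 1 - lazy_cycle_q(n, k) ** (1 / k)
tau = (1 - math.cos(2 * math.pi / n)) / 2   # true gap of the lazy 8-cycle
```

The search only maintains the invariant at its two endpoints, so it still terminates with an adjacent pair when the oracle is a noisy, non-monotone estimate; it uses about $\log_2 K_0$ oracle calls.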

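Proposition 3. For the value of $k$ computed above and $0 < \varepsilon \le 1$, with probability at least 
$1 - \delta$, $1 - Q_k^{1/k}$ is within a factor of $1 + \varepsilon$ of $1 - q_k^{1/k}$, and hence, by Lemma 1, 
within a constant factor of $\tau$. 

Proof. With large probability, we have 

\[ |q_m - Q_m| \le \frac{\varepsilon}{8n^c} \]

for all $m$ for which we compute $Q_m$, in particular for $m = k - 1$ and $m = k$. Using (14), 

\[ q_k \ge \frac{1}{3} q_{k-1} \ge \frac{1}{3} \Bigl( \frac{1}{n^c} - \frac{\varepsilon}{8n^c} \Bigr) \ge \frac{1}{4n^c}, \]

and also 

\[ Q_k \ge q_k - \frac{\varepsilon}{8n^c} \ge \Bigl( 1 - \frac{\varepsilon}{2} \Bigr) q_k. \tag{17} \]

Similarly, 

\[ Q_k \le \Bigl( 1 + \frac{\varepsilon}{2} \Bigr) q_k. \tag{18} \]

We claim that 

\[ 1 - \frac{\varepsilon}{2} \le \frac{1 - Q_k^{1/k}}{1 - q_k^{1/k}} \le 1 + \frac{\varepsilon}{2}. \tag{19} \]

To show the upper bound, we may assume that $Q_k < q_k$. Then using (17) and the 
elementary inequality from the proof of Lemma 1, 

\[ \frac{1 - Q_k^{1/k}}{1 - q_k^{1/k}} \le \frac{\ln Q_k}{\ln q_k} \le \frac{\ln\bigl((1 - \frac{\varepsilon}{2}) q_k\bigr)}{\ln q_k} = 1 + \frac{\ln(1 - \frac{\varepsilon}{2})}{\ln q_k} \le 1 + \frac{\varepsilon}{2}. \]

The lower bound in (19) follows similarly. Hence by Lemma 1, 

\[ \tau \le 1 - q_k^{1/k} \le \Bigl( 1 - \frac{\varepsilon}{2} \Bigr)^{-1} \bigl( 1 - Q_k^{1/k} \bigr) \le (1 + \varepsilon) \bigl( 1 - Q_k^{1/k} \bigr), \]

and 

\[ \tau \ge \Bigl( 1 - \frac{\ln n}{\ln(1/q_k)} \Bigr) \bigl( 1 - q_k^{1/k} \bigr) \ge \Bigl( 1 - \frac{\ln n}{\ln(1/q_k)} \Bigr) (1 - \varepsilon) \bigl( 1 - Q_k^{1/k} \bigr). \]

Since $q_k \le Q_k + \varepsilon/(8n^c) < 2/n^c$, the first factor in the last bound is at least 
$1 - \ln n/(c \ln n - \ln 2)$, which is a positive constant once $n^{c-1} > 2$. □ 

7. Concluding remarks 
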

1. We can estimate for every node-transitive graph, by similar means, the value 
$1 - \max(\lambda_2, |\lambda_n|)$, which governs the mixing time of the chain. The trick is to 
consider the matrix $M^2$ instead of $M$, i.e., observe the chain only every other step. 
A little care is in order, since this new chain may not be connected; but by 
node-transitivity, its eigenvalue gap is the eigenvalue gap of the component containing 
the observation node. 

2. The second moment of the first return time also has some more direct meaning. 
Let $H(\pi, r)$ denote the expected number of steps before a random walk starting from 
the stationary distribution hits the root $r$. Then it is not hard to show, using that 
the walk is close to stationary at a far away time, that 

\[ H(\pi, r) = \frac{E(T_1^2)}{2E(T_1)} - \frac{1}{2}. \]

It is not clear whether any of the higher moments have any direct combinatorial 
significance. 

3. Here are a couple of related problems. 

Problem: Let $G$ be a connected graph of size $n$. We label the vertices randomly 
by $m(n)$ colors and observe the colors as they are visited by a simple 
random walk: after each step, the walker tells you "now I'm at red", "now at blue", 
and so on. How many colors are needed in order to recover the shape of $G$ a.s. from 
this sequence of colors? 

Problem: Consider an $n$-node connected graph. Take $n$ particles labeled $1, \dots, n$. 
In a configuration, there is one particle at each node. The interchange process 
introduced in [?] is the following continuous time Markov chain on configurations: 
for each edge $(i, j)$, at rate 1 the particles at $i$ and $j$ are interchanged. Assume you 
observe the restriction of the interchange process to a fixed node. What graph 
properties can be recovered? Obviously you get more information than in the case 
discussed in the paper, which corresponds to noticing only one of the particles. But 
is it really possible to use this information to discover more about the graph? 